Analysis of the impact of changes in average temperature and rainfall on the average potato price in Peru, 2006-2015
Produced by Fabulous Group:
Vitalii Zakhozhyi, Haotian Bu, Brady Nordstrom, Ian Langer
Contents:
Introduction & Data Summary
Introduction
Peru is the main producer of potatoes in Latin America, and potato yields and prices are closely tied to farmers' incomes. It is therefore important for the government to share information on production trends with farmers. To do so, the government needs to identify the factors that influence potato prices. This report explores the relationship between the potato price and temperature and rainfall in Peru.
Research Question
The main research question is the following:
Do changes in average temperature and average rainfall affect the average price of potatoes in Peru over 2006-2015?
Data
For the research, we used data from two sources. The retail potato prices come from the World Food Programme's Global Food Prices data (https://www.kaggle.com/jboysen/global-food-prices). The dataset contains 743,914 price observations for various goods in markets across the developing world. The data include the country, market, price of the good in local currency, quantity, and month recorded.
For the climate data, we used the World Bank Climate Change Knowledge Portal's historical data on monthly average rainfall and temperature in Peru from 1991 to 2015, available at: http://sdwebx.worldbank.org/climateportal/index.cfm?page=downscaled_data_download&menu=historical. The rainfall and temperature datasets each contain 1,800 observations, with the year and month of each observation recorded.
Our dependent variable is the retail price of potatoes in Peru in 2006-2015 (in Peruvian Sol per 5 kg).
The independent (explanatory) variables are the near-surface monthly mean air temperature (in °C) and the monthly precipitation sum (in mm) in Peru for 2006-2015. Data for March 2013 are missing.
Analysis
Importing the final dataset
In this section we import the cleaned dataset:
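link = "https://raw.githubusercontent.com/vzakhozhyi/599-A-Final-Project/master/Data%20Final/DataFinal.csv"
df=read.csv(link,stringsAsFactors = FALSE)
# read.csv() reads dates as text, so build a Date column from year and month
df$date = as.Date(paste(df$year, df$month, 1, sep = "-"))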
Check the content:
str(df)
## 'data.frame': 119 obs. of 8 variables:
## $ country : chr "Peru" "Peru" "Peru" "Peru" ...
## $ crop : chr "Potatoes" "Potatoes" "Potatoes" "Potatoes" ...
## $ year : int 2006 2006 2006 2006 2006 2006 2006 2006 2006 2006 ...
## $ month : int 1 2 3 4 5 6 7 8 9 10 ...
## $ price : num 1.05 1.17 1.21 1.21 1.12 1.05 1.02 1.02 1.08 1.08 ...
## $ temperature: num 20.2 20 20.5 19.7 18.9 ...
## $ rainfall : num 233 198 209 160 134 ...
## $ date : Date, format: "2006-01-01" "2006-02-01" ...
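Because the data note says March 2013 is missing, a quick check that the 119 rows cover every other month can be useful. The helper `find_missing_months()` below is hypothetical (not part of the original analysis) and assumes the data span whole calendar years with integer `year` and `month` columns, as in the `str()` output above.

```r
# Hypothetical helper (not from the original report): list the year-month
# combinations between the first and last observed year that have no row.
# Assumes the data cover whole calendar years.
find_missing_months <- function(d) {
  # Complete monthly grid spanning the observed years
  grid <- expand.grid(month = 1:12, year = seq(min(d$year), max(d$year)))
  # Match grid cells to observed rows with a simple "year month" key
  key_grid <- paste(grid$year, grid$month)
  key_data <- paste(d$year, d$month)
  # Keep only the grid cells that never appear in the data
  grid[!key_grid %in% key_data, c("year", "month")]
}
```

Run it as `find_missing_months(df)`; if the data note is correct, it should return a single row for year 2013, month 3.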
Distribution of the retail potato price, 2006-2015:
Distribution of temperature in Peru, 2006-2015:
Distribution of rainfall in Peru, 2006-2015:
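The report does not show the code behind the three distribution plots. A minimal base-R sketch that draws comparable histograms, assuming the `price`, `temperature`, and `rainfall` columns shown by `str(df)`, might look like this; `plot_distributions()` is a hypothetical helper, and the original figures may have been produced differently.

```r
# Sketch only (hypothetical helper): base-R histograms of the three model
# variables, drawn side by side. Not necessarily how the report's figures
# were made.
plot_distributions <- function(d) {
  vars <- c(price       = "Retail potato price (Sol per 5 kg)",
            temperature = "Mean air temperature (\u00b0C)",
            rainfall    = "Monthly precipitation (mm)")
  old_par <- par(mfrow = c(1, 3))  # three panels in one row
  on.exit(par(old_par))            # restore the previous layout afterwards
  # Draw one histogram per variable and keep the returned histogram objects
  hs <- lapply(names(vars), function(v) {
    hist(d[[v]], main = v, xlab = vars[[v]], col = "grey")
  })
  names(hs) <- names(vars)
  invisible(hs)
}
```

Call `plot_distributions(df)` after importing the data.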
Regression
We hypothesize that the retail potato price is linearly related to rainfall and temperature, and we test this hypothesis with a linear regression model:
test=lm(price~rainfall+temperature,data=df)
summary(test)
##
## Call:
## lm(formula = price ~ rainfall + temperature, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.37929 -0.18601 -0.02849 0.11291 1.04940
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.0390498 0.6893982 -1.507 0.134482
## rainfall -0.0011606 0.0004757 -2.440 0.016206 *
## temperature 0.1277528 0.0363739 3.512 0.000634 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2545 on 116 degrees of freedom
## Multiple R-squared: 0.1017, Adjusted R-squared: 0.08618
## F-statistic: 6.564 on 2 and 116 DF, p-value: 0.001992
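For reference, the coefficient table above corresponds to the following fitted equation; each t value is simply the estimate divided by its standard error:

```latex
\widehat{\text{price}} = -1.0390 - 0.0011606\,\text{rainfall} + 0.12775\,\text{temperature}

t_{\text{rainfall}} = \frac{-0.0011606}{0.0004757} \approx -2.440,
\qquad
t_{\text{temperature}} = \frac{0.1277528}{0.0363739} \approx 3.512
```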
The following chart shows the coefficients of the two explanatory variables.
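The code for the coefficient chart is not shown. One way to sketch such a chart in base R is to plot each slope with its 95% confidence interval from `confint()`; `plot_coefs()` is a hypothetical helper, not the function used for the original figure.

```r
# Sketch only (hypothetical helper): plot each slope estimate with its
# confidence interval. The intercept is dropped because it is on a
# different scale from the slopes.
plot_coefs <- function(model, level = 0.95) {
  est <- coef(model)[-1]                                  # slopes only
  ci  <- confint(model, level = level)[-1, , drop = FALSE]
  k   <- length(est)
  plot(est, seq_len(k), xlim = range(ci, 0), yaxt = "n",
       pch = 19, xlab = "Estimate", ylab = "")
  axis(2, at = seq_len(k), labels = names(est), las = 1)
  segments(ci[, 1], seq_len(k), ci[, 2], seq_len(k))      # interval bars
  abline(v = 0, lty = 2)                                  # zero reference line
  invisible(data.frame(estimate = est, lower = ci[, 1], upper = ci[, 2],
                       row.names = names(est)))
}
```

For example, `plot_coefs(test)`. Because rainfall is measured in mm and temperature in °C, the rainfall coefficient (about -0.001) will look close to zero next to the temperature coefficient (about 0.13) on a shared axis; standardizing the predictors first would put them on a comparable scale.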
Checking the assumptions of the linear regression
## [1] 91 119
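Rows 91 and 119 are identified as potential outliers. The non-constant variance test (ncvTest) is non-significant (p = 0.112), so there is no evidence that the error variance changes with the level of the fitted values; the constant-variance assumption is not rejected: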
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 2.523271, Df = 1, p = 0.11218
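The ncvTest output above comes from the car package. As a cross-check, the Cook-Weisberg score test against the fitted values can be sketched in base R: scale the squared residuals by their mean, regress them on the fitted values, and halve the regression sum of squares. We believe this matches what `car::ncvTest()` computes for `~ fitted.values`, but treat that as an assumption; `ncv_score_test()` is a hypothetical helper.

```r
# Sketch (hypothetical helper): Cook-Weisberg score test for non-constant
# variance, with the fitted values as the variance predictor. Base R only.
ncv_score_test <- function(model) {
  e   <- residuals(model)
  n   <- length(e)
  u   <- e^2 / (sum(e^2) / n)       # squared residuals / ML variance estimate
  fit <- fitted(model)
  aux <- lm(u ~ fit)                # auxiliary regression on fitted values
  ss_reg <- sum((fitted(aux) - mean(u))^2)
  chisq  <- ss_reg / 2              # score statistic, 1 degree of freedom
  c(chisq = chisq, df = 1,
    p = pchisq(chisq, df = 1, lower.tail = FALSE))
}
```

If the assumption holds, `ncv_score_test(test)` should give a chi-square statistic and p-value close to the ones printed above (2.52 and 0.112).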
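# Check for multicollinearity: a VIF above 4 is a common warning sign
library(car)  # vif() comes from the car package
vif(test) > 4
## rainfall temperature
## FALSE FALSE
Neither variance inflation factor exceeds 4, so multicollinearity between rainfall and temperature is not a concern.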
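The VIF check can also be reproduced from first principles: for predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing that predictor on the other predictors. With only two predictors, both VIFs equal 1 / (1 - r^2), where r is the correlation between rainfall and temperature. `vif_manual()` is a hypothetical helper used only for illustration.

```r
# Sketch (hypothetical helper): variance inflation factors from first
# principles. VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing
# predictor j on all the other predictors.
vif_manual <- function(d, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    f  <- reformulate(others, response = p)   # e.g. rainfall ~ temperature
    r2 <- summary(lm(f, data = d))$r.squared
    1 / (1 - r2)
  })
}
```

`vif_manual(df, c("rainfall", "temperature"))` returns the numeric VIFs rather than the TRUE/FALSE comparison above.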
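The most influential observations, with their studentized residuals (StudRes), hat values (Hat), and Cook's distances (CookD), are:
## StudRes Hat CookD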
## 62 0.46085088 0.07717901 5.961288e-03
## 90 2.63012399 0.06446670 1.511817e-01
## 91 2.87362603 0.04276336 1.157272e-01
## 116 0.03138585 0.07235306 2.583314e-05
## 119 4.52189971 0.02896799 1.741375e-01
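The columns of the influence table correspond to standard base-R diagnostics: `rstudent()`, `hatvalues()`, and `cooks.distance()`. The sketch below builds the same three columns for every observation and applies common rules of thumb (|StudRes| > 2, Hat > 2p/n, CookD > 4/n). These cutoffs are an assumption, not the criterion the report used, so `flag_influential()` (a hypothetical helper, like `influence_table()`) may flag a somewhat different set of rows than the five listed.

```r
# Sketch (hypothetical helpers): base-R versions of the three columns in
# the influence table, computed for every observation.
influence_table <- function(model) {
  data.frame(StudRes = rstudent(model),        # externally studentized residuals
             Hat     = hatvalues(model),       # leverage
             CookD   = cooks.distance(model))  # Cook's distance
}

# Assumed screening rule (common rules of thumb, not the report's):
# |StudRes| > 2, Hat > 2p/n, or CookD > 4/n.
flag_influential <- function(model) {
  tab <- influence_table(model)
  n <- nrow(tab)
  p <- length(coef(model))
  keep <- abs(tab$StudRes) > 2 | tab$Hat > 2 * p / n | tab$CookD > 4 / n
  tab[keep, ]
}
```

For example, `flag_influential(test)` lists the rows that exceed at least one cutoff.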
We exclude these five cases (rows 62, 90, 91, 116, and 119) and re-estimate the model:
# Row indices of the influential observations (these rows are months, not countries)
CountrysOUT=c(62,90,91,116,119)
newtest = lm(price~temperature+rainfall,
             data=df[-CountrysOUT,])
summary(newtest)
##
## Call:
## lm(formula = price ~ temperature + rainfall, data = df[-CountrysOUT,
## ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.37854 -0.17615 -0.01781 0.12746 0.62373
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.4657554 0.6535984 -2.243 0.02691 *
## temperature 0.1486177 0.0345264 4.304 3.62e-05 ***
## rainfall -0.0012066 0.0004402 -2.741 0.00714 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2214 on 111 degrees of freedom
## Multiple R-squared: 0.1461, Adjusted R-squared: 0.1308
## F-statistic: 9.5 on 2 and 111 DF, p-value: 0.0001555
Comparison of the new regression results with the original ones:
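The comparison itself is not reproduced in the text. A small sketch for placing the two sets of estimates side by side follows; it matches coefficients by name, which matters here because `newtest` lists `temperature` before `rainfall`. `compare_fits()` is a hypothetical helper.

```r
# Sketch (hypothetical helper): coefficients and adjusted R-squared of two
# lm fits side by side, matched by coefficient name.
compare_fits <- function(before, after) {
  terms <- names(coef(before))
  out <- rbind(
    cbind(before = coef(before)[terms], after = coef(after)[terms]),
    adj_r2 = c(summary(before)$adj.r.squared, summary(after)$adj.r.squared)
  )
  round(out, 5)
}
```

With `compare_fits(test, newtest)`, the printed summaries imply that the temperature coefficient rises from about 0.128 to 0.149, the rainfall coefficient moves from about -0.00116 to -0.00121, and the adjusted R-squared increases from 0.086 to 0.131.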
This plot shows the predicted and actual potato prices:
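A base-R sketch of a predicted-versus-actual plot over time, assuming the `date` and `price` columns from `str(df)`; `plot_actual_vs_fitted()` is hypothetical, and the original plot may have been drawn differently.

```r
# Sketch (hypothetical helper): actual vs predicted prices over time.
# `rows` selects which rows of `d` to plot (all rows by default).
plot_actual_vs_fitted <- function(model, d, rows = seq_len(nrow(d))) {
  sub  <- d[rows, ]
  pred <- predict(model, newdata = sub)
  plot(sub$date, sub$price, type = "l", xlab = "Date",
       ylab = "Price (Sol per 5 kg)", ylim = range(sub$price, pred))
  lines(sub$date, pred, lty = 2, col = "red")   # model predictions
  legend("topleft", legend = c("Actual", "Predicted"),
         lty = c(1, 2), col = c("black", "red"))
  invisible(data.frame(date = sub$date, actual = sub$price, predicted = pred))
}
```

For example, `plot_actual_vs_fitted(newtest, df)` plots predictions for all 119 months, including the five excluded from estimation.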
Research Findings & Conclusion
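The regression analysis supports our hypothesis: other things being equal, rainfall and temperature have statistically significant relationships with the retail potato price. In the full-sample model, a 1 mm increase in monthly precipitation is associated with a 0.00116 Sol decrease in the potato price on average, and a 1 °C increase in monthly mean air temperature is associated with a 0.128 Sol increase. Excluding the five influential observations gives similar estimates (-0.00121 Sol per mm and 0.149 Sol per °C). However, the two variables explain only a modest share of the price variation (R-squared of 0.10 for the full sample and 0.15 after excluding the influential observations), so other factors account for most of it.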
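Because a 1 mm change in monthly rainfall is tiny, it can help to scale the estimated effects. The sketch below uses the first model's printed coefficients as defaults; `implied_price_change()` is a hypothetical helper, and the result is a linear extrapolation of the fitted model, not a forecast.

```r
# Worked arithmetic with the full-sample coefficients printed above
# (rainfall: -0.0011606 Sol per mm; temperature: 0.1277528 Sol per deg C).
# Hypothetical helper: change in predicted price for given changes in inputs.
implied_price_change <- function(d_rain_mm, d_temp_c,
                                 b_rain = -0.0011606, b_temp = 0.1277528) {
  b_rain * d_rain_mm + b_temp * d_temp_c
}

implied_price_change(100, 0)  # 100 mm more monthly rainfall
implied_price_change(0, 1)    # 1 deg C warmer
implied_price_change(100, 1)  # both changes together
```

By this arithmetic, 100 mm of additional monthly rainfall corresponds to a price about 0.116 Sol lower per 5 kg, roughly offsetting the 0.128 Sol increase associated with a 1 °C rise in temperature.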